Power of GWAS
using linear mixed models


Andrey Ziyatdinov

July 5, 2018
Institut Pasteur, Statistical Genetics Group

Outline

  1. GWAS and Linear Mixed Models
  2. Our aims
  3. Analytical Results for Power
  4. Conclusions

Genome-wide association study (GWAS)

Association of phenotypic variation with common genetic variants across the genome (single nucleotide polymorphisms, SNPs)

  • The HapMap Project (International HapMap Consortium, 2003)
  • First well-designed GWAS paper (Wellcome Trust Case Control Consortium, 2007)
  • Commercial genotyping arrays at low cost
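| id | outcome (\(y\)) |
|----|-----------------|
| 1  | 1.4  |
| 2  | -3.2 |
| 3  | 1.5  |
| …  | …    |
| N  | -3.1 |

\(\sim\)

| id | SNP 1 (\(X_1\)) | SNP 2 (\(X_2\)) | … | SNP M (\(X_M\)) |
|----|-----------------|-----------------|---|-----------------|
| 1  | 0 | 2 | … | 1 |
| 2  | 1 | 1 | … | 0 |
| 3  | 1 | 0 | … | 2 |
| …  | … | … | … | … |
| N  | 2 | 1 | … | 1 |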

Agnostic search & the simplest strategy. For every SNP \(i\):

  • linear regression: \(y \sim (\mu + \beta_i X_i, \sigma^2_r I)\)
  • \(r = cor(y, X_i)\) ⟶ \(Z = \sqrt{N - 2} \frac{r}{\sqrt{1 - r^2}} \approx \sqrt{N} r\), with \(Z \sim \mathcal{N}(0, 1)\) approximately under the null hypothesis \(\beta_i = 0\)
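
The per-SNP test above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used for the slides: the function name `marginal_z_scores`, the simulated genotypes, and the effect size 0.2 on the first SNP are all assumptions made for the example.

```python
import numpy as np

def marginal_z_scores(y, X):
    """Z-score per SNP from the Pearson correlation of y with each column of X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = y.shape[0]
    yc = y - y.mean()                      # center the outcome
    Xc = X - X.mean(axis=0)                # center each SNP column
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sqrt(n - 2) * r / np.sqrt(1.0 - r**2)

# Hypothetical toy data: N individuals, M SNPs coded 0/1/2, only SNP 1 is causal.
rng = np.random.default_rng(0)
N, M, maf = 1000, 5, 0.3
X = rng.binomial(2, maf, size=(N, M))
y = 0.2 * X[:, 0] + rng.normal(size=N)
z = marginal_z_scores(y, X)
```

The statistic equals the t-statistic of the slope in the simple regression of \(y\) on \(X_i\), which is why a single correlation per SNP is enough.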

In GWAS we trust

  • Simple
  • Fast
  • Robust
  • Reproducible
  • Predictable

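Statistical power depends on the non-centrality parameter (NCP) of the test. For a SNP with minor allele frequency (MAF) \(p\) and effect size \(\beta\), tested in \(N\) unrelated individuals with a linear model (LM), and with \(q^2\) denoting the proportion of variance of a standardized phenotype explained by the SNP: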

\[NCP = N q^2 = N \beta^2 [2 p (1 - p)]\]
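
To turn an NCP into a power value, one common route is the non-central chi-square distribution with 1 degree of freedom. The sketch below assumes a genome-wide significance level of \(5 \times 10^{-8}\) and a residual variance of 1; both are assumptions of this example, as is the function name `gwas_power`.

```python
from scipy.stats import chi2, ncx2

def gwas_power(n, beta, maf, alpha=5e-8, resid_var=1.0):
    """Power of a 1-df association test for one SNP in n unrelated individuals."""
    # NCP = N * beta^2 * 2p(1-p), divided by the residual variance (1 by default)
    ncp = n * beta**2 * 2.0 * maf * (1.0 - maf) / resid_var
    # Critical value of the central chi-square test at level alpha
    threshold = chi2.isf(alpha, df=1)
    # Probability that the non-central statistic exceeds the critical value
    return ncx2.sf(threshold, df=1, nc=ncp)

# Hypothetical scenario: same SNP, two sample sizes
power_10k = gwas_power(10_000, beta=0.1, maf=0.3)
power_50k = gwas_power(50_000, beta=0.1, maf=0.3)
```

Because the NCP is linear in \(N\), the same function can be inverted numerically to find the sample size needed for a target power.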

GWAS using Linear Mixed Models (LMM) vs LM

\(var(y) = \sum{\sigma_i^2 R_i} + \sigma_r^2 I\) ⟶ LMM

  1. Testing main genetic effect (GWAS)

|       | Unrelated individuals (LM) | Related individuals (LMM) |
|-------|----------------------------|---------------------------|
| Model | \(y \sim (\mu + \beta x , \mbox{ } \sigma^2_r I)\) | \(y \sim (\mu + \beta x , \mbox{ } \sum{\sigma_i^2 R_i} + \sigma_r^2 I)\) |
| NCP   | \(NCP = N q^2 = N \beta^2 [2 p (1 - p)]\) | \(NCP =\) ? |

  2. Testing GxE interaction effect (GWAI)

|       | Unrelated individuals (LM) | Related individuals (LMM) |
|-------|----------------------------|---------------------------|
| Model | \(y \sim (\mu + \beta x + d \alpha + \delta \mbox{ } x*d , \mbox{ } \sigma^2_r I)\) | \(y \sim (\mu + \beta x + d \alpha + \delta \mbox{ } x*d , \mbox{ } \sum{\sigma_i^2 R_i} + \sigma_r^2 I)\) |
| NCP   | \(NCP = N q^2 w^2= N \delta^2 [2 p (1 - p) f (1 - f)]\) | \(NCP =\) ? |


Flavours of Relatedness

Related individuals

  • Genetically Related, e.g. Families or Cryptic Relatedness
  • Grouping, e.g. medical center

(Genetically) Unrelated individuals

  • The infinitesimal model of genetic architecture
    (increased GWAS power due to modeling polygenic effect; Yang et al., 2014)


| Generative model | Association model |
|------------------|-------------------|
| \(y \sim (\mu + \sum_{i=1}^{M} \beta_i X_i, \sigma^2_r I)\) | \(y \sim (\mu + \beta_k X_k, \sigma_m^2 M_{-k} + \sigma^2_r I)\) |
| \(\beta_i \sim \mathcal{N}(0, \sigma^2_{\beta})\) | \(M_{-k} = X_{-k} X_{-k}^T / (M-1)\), where \(X_{-k}\) is \(X\) without SNP \(k\) |
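
The two models can be illustrated with a small simulation: draw a phenotype from the generative (infinitesimal) model, then build the relationship matrix that the association model would use when testing SNP \(k\). This is a sketch under assumptions that are not in the slides: genotypes are standardized column-wise, \(\sigma^2_{\beta} = h^2 / M\) with a heritability \(h^2 = 0.5\), and the helper names are hypothetical.

```python
import numpy as np

def standardize(G):
    """Center and scale each SNP column to mean 0 and variance 1."""
    return (G - G.mean(axis=0)) / G.std(axis=0)

def grm_without_snp(X, k):
    """Relationship matrix from all SNPs except SNP k, divided by M - 1."""
    X_rest = np.delete(X, k, axis=1)
    return X_rest @ X_rest.T / X_rest.shape[1]

rng = np.random.default_rng(1)
N, M, h2 = 200, 500, 0.5                      # assumed sizes and heritability
mafs = rng.uniform(0.1, 0.5, size=M)
X = standardize(rng.binomial(2, mafs, size=(N, M)))

# Generative model: every SNP has a small normally distributed effect
beta = rng.normal(0.0, np.sqrt(h2 / M), size=M)
y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=N)

# Association model for SNP k = 0: covariance is sigma_m^2 * M_minus_k + sigma_r^2 * I
M_minus_k = grm_without_snp(X, k=0)
```

Leaving the tested SNP out of the relationship matrix avoids fitting the same SNP as both a fixed and a random effect.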

Modeling relatedness and GxE is important
for biobank-scale datasets, e.g. UKBiobank

  • Inclusion of genetically related individuals empowers GWAS (Loh et al., 2018)
  • The wealth of collected environmental exposures has potential to uncover GxE interactions (Young et al., 2016)

Questions we ask

Which study design is more powerful?
(for a given SNP with MAF (\(p\)), effect size (\(\beta\)) and sample size \(N\))

  1. Unrelated or families?
    • testing genetic effect of SNP (Visscher et al., 2008)
    • testing GxE interaction?
  2. How much power is gained under the infinitesimal model by using LMM instead of LM (Yang et al., 2014)?

Results

Derivation of NCP for LMM: fitting LMM

\(y \sim (\mu + \beta x, \sum{\sigma_i^2 R_i} + \sigma_r^2 I) = (\mu + \beta x, V)\)

  1. Estimate variance components by ML/REML

\(\hat{V} = \sum{\hat{\sigma}_i^2 R_i} + \hat{\sigma}_r^2 I\)

  2. Estimate effect sizes by GLS

\(\hat{\beta} = (x^T \hat{V}^{-1} x)^{-1} x^T \hat{V}^{-1} y\)

\(var(\hat{\beta}) = (x^T \hat{V}^{-1} x)^{-1}\)
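
The GLS step can be sketched as follows, taking the estimated covariance \(\hat{V}\) as given (the ML/REML step is not shown). Unlike the compact slide formula, this example fits the intercept \(\mu\) explicitly alongside the SNP; the function name `gls_snp_effect` and the toy data are assumptions of the example.

```python
import numpy as np

def gls_snp_effect(y, x, V):
    """GLS estimate and variance of the SNP effect, given a covariance matrix V."""
    n = len(y)
    D = np.column_stack([np.ones(n), x])      # design matrix: intercept + SNP
    Vinv_D = np.linalg.solve(V, D)            # V^{-1} D without forming V^{-1}
    Vinv_y = np.linalg.solve(V, y)            # V^{-1} y
    info = D.T @ Vinv_D                       # D^T V^{-1} D
    coef = np.linalg.solve(info, D.T @ Vinv_y)
    cov = np.linalg.inv(info)                 # (D^T V^{-1} D)^{-1}
    return coef[1], cov[1, 1]

# Hypothetical check: with V proportional to the identity, GLS reduces to OLS
rng = np.random.default_rng(3)
n = 300
x = rng.binomial(2, 0.3, size=n).astype(float)
y = 1.0 + 0.3 * x + rng.normal(scale=np.sqrt(2.0), size=n)
beta_hat, var_beta_hat = gls_snp_effect(y, x, 2.0 * np.eye(n))
ncp_hat = beta_hat**2 / var_beta_hat
```

Solving linear systems with \(\hat{V}\) rather than inverting it is a numerical choice of this sketch; the estimates are the same.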

Derivation of NCP for LMM: approximation of \(var(\hat{\beta})\)

\(NCP = \hat{\beta}^2 / var(\hat{\beta}); var(\hat{\beta}) = [x^T \hat{V}^{-1} x]^{-1} \approx [E(x^T \hat{V}^{-1} x)]^{-1} \approx [trace(\hat{V}^{-1} \Sigma_x)]^{-1}\)

Approximation using the quadratic forms
If \(x\) is a vector of random variables, the quadratic form \(x^TAx\) is a scalar random variable.
If \(x\) has mean \(\mu\) and (nonsingular) covariance matrix \(V\), then

\(E(x^TAx) = tr(AV) + \mu^T A \mu\)
\(\sigma^2(x^TAx) = 2tr(AVAV) + 4\mu^T AVA \mu\)

(Lynch and Walsh, 1998)
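
Both moment formulas can be checked numerically. The sketch below draws multivariate normal vectors (the variance formula relies on normality and on a symmetric \(A\)) and compares the empirical mean and variance of \(x^TAx\) with the closed forms; the dimension, seed, and matrices are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)          # positive definite covariance of x
C = rng.normal(size=(n, n))
A = C @ C.T                          # symmetric matrix of the quadratic form
mu = rng.normal(size=n)

# Monte Carlo sample of the quadratic form x^T A x
x = rng.multivariate_normal(mu, V, size=200_000)
q = np.einsum("ij,jk,ik->i", x, A, x)

# Closed-form moments from the formulas above
expected_mean = np.trace(A @ V) + mu @ A @ mu
expected_var = 2.0 * np.trace(A @ V @ A @ V) + 4.0 * mu @ A @ V @ A @ mu

print(q.mean(), expected_mean)
print(q.var(), expected_var)
```

For centered genotypes the \(\mu^T A \mu\) term vanishes, which leaves only the trace term used in the approximation of \(var(\hat{\beta})\).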

For unrelated individuals & testing main genetic effect:
\(\hat{V} = \hat{\sigma}_r^2 I\) and \(\Sigma_x = \sigma_x^2 I = 2 p (1 - p) I\)

  • \(var(\hat{\beta}) = [trace(\hat{V}^{-1} \Sigma_x)]^{-1} = [N \mbox{ } 2 p (1 - p) / \hat{\sigma}_r^2]^{-1}\)
  • \(NCP = \hat{\beta}^2 / var(\hat{\beta}) = N \hat{\beta}^2 2 p (1 - p) / \hat{\sigma}_r^2\)

Power comparison to detect main genetic effect

| Study | Model \(y \sim (X \beta , V)\) | Genotype \(x_g\) | \(NCP\) | TF |
|-------|--------------------------------|------------------|---------|----|
| Unrelated | \(y \sim (\mu + \beta_g x_g, \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 I)\) | \([N / \sigma_r^2] \beta_g^2 \sigma_g^2\) | 1 |
| Families | \(y \sim (\mu + \beta_g x_g, \sigma_k^2 K + \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 K)\) | \([tr((\sigma_k^2 K + \sigma_r^2 I)^{-1} K)] \beta_g^2 \sigma_g^2\) | 0.8253 |
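
The bracketed trace term can be evaluated directly for a given pedigree structure. The sketch below is not the configuration behind the value reported in the table: it assumes nuclear families of two parents and two siblings, variance components \(\sigma_k^2 = \sigma_r^2 = 0.5\), and the same total phenotypic variance in the unrelated design. The helper names are hypothetical.

```python
import numpy as np

def ncp_trace_factor(V, R):
    """Compute tr(V^{-1} R), the design-dependent factor in the NCP."""
    return np.trace(np.linalg.solve(V, R))

def nuclear_family_relationship(n_families, n_sibs=2):
    """Block-diagonal relationship matrix: 2 unrelated parents + n_sibs siblings."""
    size = 2 + n_sibs
    block = np.full((size, size), 0.5)        # parent-offspring and sib-sib pairs
    block[0, 1] = block[1, 0] = 0.0           # the two parents are unrelated
    np.fill_diagonal(block, 1.0)
    return np.kron(np.eye(n_families), block)

sigma_k2, sigma_r2 = 0.5, 0.5                 # assumed variance components
K = nuclear_family_relationship(n_families=50, n_sibs=2)
N = K.shape[0]

V_fam = sigma_k2 * K + sigma_r2 * np.eye(N)
factor_families = ncp_trace_factor(V_fam, K)
factor_unrelated = N / (sigma_k2 + sigma_r2)  # unrelated design, same total variance
ratio = factor_families / factor_unrelated
print(ratio)
```

When \(K\) has a unit diagonal, this ratio cannot exceed 1, because \(\lambda / (\sigma_k^2 \lambda + \sigma_r^2)\) is concave in the eigenvalues \(\lambda\) of \(K\), whose mean is 1.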

Power comparison to detect main genetic effect

| Study | Model \(y \sim (X \beta , V)\) | Genotype \(x_g\) | \(NCP\) | TF |
|-------|--------------------------------|------------------|---------|----|
| Unrelated | \(y \sim (\mu + \beta_g x_g, \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 I)\) | \([N / \sigma_r^2] \beta_g^2 \sigma_g^2\) | 1 |
| Unrelated + grouping | \(y \sim (\mu + \beta_g x_g, \sigma_h^2 H + \sigma_r^2 I)\) | \(x_g \sim (\mu_g, \sigma_g^2 I)\) | \([tr((\sigma_h^2 H + \sigma_r^2 I)^{-1})] \beta_g^2 \sigma_g^2\) | >1 |
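
A short worked case shows why the factor for grouping exceeds 1. The setup is an assumption made for illustration: \(H\) is block-diagonal with \(G\) groups of equal size \(n\) (so \(N = Gn\)), each block being the all-ones matrix \(J_n\), and the model that ignores grouping has residual variance \(\sigma_h^2 + \sigma_r^2\).

\[(\sigma_h^2 J_n + \sigma_r^2 I_n)^{-1} = \frac{1}{\sigma_r^2} \left[ I_n - \frac{\sigma_h^2}{\sigma_r^2 + n \sigma_h^2} J_n \right]\]

\[tr((\sigma_h^2 H + \sigma_r^2 I)^{-1}) = \frac{N}{\sigma_r^2} \cdot \frac{\sigma_r^2 + (n - 1) \sigma_h^2}{\sigma_r^2 + n \sigma_h^2}\]

\[\frac{tr((\sigma_h^2 H + \sigma_r^2 I)^{-1})}{N / (\sigma_h^2 + \sigma_r^2)} = 1 + \frac{(n - 1) \sigma_h^4}{\sigma_r^2 (\sigma_r^2 + n \sigma_h^2)} \geq 1\]

Equality holds only when \(n = 1\) or \(\sigma_h^2 = 0\), that is, when there is no grouping to model.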

Power comparison to detect GxE interaction effect

| Study | Model \(y \sim (X \beta , V)\) | Interaction term \(x_{int}\) | \(NCP\) | TF |
|-------|--------------------------------|------------------------------|---------|----|
| Unrelated | \(y \sim (\mu + ... + \delta x_{int}, \sigma_r^2 I)\) | \(x_{int} \sim (\mu_{int}, \sigma_{int}^2 I)\) | \([N / \sigma_r^2] \delta^2 \sigma_g^2 \sigma_d^2\) | 1 |
| Families | \(y \sim (\mu + ... + \delta x_{int}, \sigma_k^2 K + \sigma_r^2 I)\) | \(x_{int} \sim (\mu_{int}, \sigma_{int}^2 K_{int})\) | \([tr((\sigma_k^2 K + \sigma_r^2 I)^{-1} K_{int})] \delta^2 \sigma_g^2 \sigma_d^2\) | 1.003-8.043 |

Conclusions

Testing main genetic effect (GWAS)

  1. Study designs: unrelated \(\approx\) genetically related
  2. Power increases as more phenotypic variance is explained
    • by accounting for any relevant grouping
    • by modeling the polygenic effect (the infinitesimal model of genetic architecture)

Testing GxE interaction effect (GWAI)

  1. Study designs: genetically related \(>\) unrelated
    • if exposure \(d\) and genotype \(x\) are independent


Applications

  • Power calculation for study designs where LMM are applied
  • Plan and optimize family-based designs for GWAI (GxE)
  • Proposal: estimate the effective sample size \(N_{eff}\) (Loh et al., 2018) using \(tr((\sigma_k^2 K + \sigma_r^2 I)^{-1} K)\)
    • Post-GWAS methods: LDSC, fine-mapping, meta-analysis, JASS, etc

Thank you!